Real-world datasets often include values for many variables. Our brains cannot efficiently process high-dimensional datasets to derive useful, actionable insights. In this post I will look at ways to
- deal with these multi-dimensional datasets
- uncover and visualize hidden patterns in the data
The three fundamental dimensionality reduction techniques that will be covered are:
- Principal component analysis (PCA)
- Non-negative matrix factorisation (NNMF)
- Exploratory factor analysis (EFA)
As a data scientist, you’ll frequently have to deal with messy, high-dimensional datasets. In this section, I will cover Principal Component Analysis (PCA), which reduces the dimensionality of a dataset so that it is easier to extract actionable insights. The motivation for techniques such as PCA is to explain as much of the variation in the data as possible while discarding highly correlated, redundant variables.
Dimensions: the columns in the dataset that represent features of the observations
Dimensionality: the number of features (columns) characterizing the dataset
Observed vs. true dimensionality: the observed features can obscure the true, or intrinsic, dimensionality of the data.
Deal with the curse of dimensionality by removing redundancy.
Note: As the dimensionality of the data grows, the feature space grows with it.
The data used for this analysis is the 2004 New Car and Truck Data, which can be found at the JSE Data Archive.
This data set includes features of a number of car brands from 2004. The first step is to explore the dataset and attempt to draw useful conclusions from the correlation matrix. Correlation reveals how closely features resemble one another, which helps to infer how cars are related to each other based on their feature values. The data consist of 387 observations and 21 variables.
Screeplot and the explained variance
Explore cars with summary()
summary(cars)
## Vehicle.Name Sports.Car SUV Wagon
## Length:387 Min. :0.0000 Min. :0.0000 Min. :0.00000
## Class :character 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000
## Mode :character Median :0.0000 Median :0.0000 Median :0.00000
## Mean :0.1163 Mean :0.1525 Mean :0.07494
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :1.0000 Max. :1.0000 Max. :1.00000
## Minivan Pickup AWD RWD
## Min. :0.00000 Min. :0 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.00000 1st Qu.:0 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.00000 Median :0 Median :0.0000 Median :0.0000
## Mean :0.05168 Mean :0 Mean :0.2016 Mean :0.2429
## 3rd Qu.:0.00000 3rd Qu.:0 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.00000 Max. :0 Max. :1.0000 Max. :1.0000
## Retail.Price Dealer.Cost Engine.Size Cyl
## Min. : 10280 Min. : 9875 Min. :1.400 Min. : 3.000
## 1st Qu.: 20997 1st Qu.: 19575 1st Qu.:2.300 1st Qu.: 4.000
## Median : 28495 Median : 26155 Median :3.000 Median : 6.000
## Mean : 33231 Mean : 30441 Mean :3.127 Mean : 5.757
## 3rd Qu.: 39552 3rd Qu.: 36124 3rd Qu.:3.800 3rd Qu.: 6.000
## Max. :192465 Max. :173560 Max. :6.000 Max. :12.000
## HP City.MPG Highway.MPG Weight
## Min. : 73.0 Min. :10.00 Min. :12.00 Min. :1850
## 1st Qu.:165.0 1st Qu.:18.00 1st Qu.:24.00 1st Qu.:3107
## Median :210.0 Median :19.00 Median :27.00 Median :3469
## Mean :214.4 Mean :20.31 Mean :27.26 Mean :3532
## 3rd Qu.:250.0 3rd Qu.:21.50 3rd Qu.:30.00 3rd Qu.:3922
## Max. :493.0 Max. :60.00 Max. :66.00 Max. :6400
## Wheel.Base Length Width wheeltype type
## Min. : 89.0 Min. :143 Min. :64.00 AWD: 78 Minivan : 20
## 1st Qu.:103.0 1st Qu.:177 1st Qu.:69.00 RWD:309 Pickup :234
## Median :107.0 Median :186 Median :71.00 Sports Car: 45
## Mean :107.2 Mean :185 Mean :71.28 SUV : 59
## 3rd Qu.:112.0 3rd Qu.:193 3rd Qu.:73.00 Wagon : 29
## Max. :130.0 Max. :221 Max. :81.00
A correlation matrix is a matrix of pairwise correlation coefficients between variables. A smaller number of dimensions translates to a less complex correlation matrix.
##
## Two-Step Estimates
##
## Correlations/Type of Correlation:
## Retail.Price Dealer.Cost Engine.Size
## Retail.Price 1 Pearson Pearson
## Dealer.Cost 0.9991 1 Pearson
## Engine.Size 0.5994 0.5936 1
##
## Standard Errors:
## Retail.Price Dealer.Cost
## Retail.Price
## Dealer.Cost 8.919e-05
## Engine.Size 0.03266 0.03301
##
## n = 387
##
## P-values for Tests of Bivariate Normality:
## Retail.Price Dealer.Cost
## Retail.Price
## Dealer.Cost NaN
## Engine.Size 2.952e-26 9.253e-26
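To make the link between dimensionality and the size of the correlation matrix concrete, here is a minimal base-R sketch. It is an illustration only: it uses `cor()` rather than the function that produced the output above, and it assumes the built-in `mtcars` data as a stand-in for the cars data.

```r
# Assumption: mtcars (ships with R) stands in for the cars data used in this post.
num_vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Pearson correlation matrix: symmetric, with ones on the diagonal.
cor_mat <- cor(num_vars)
print(round(cor_mat, 3))

# p variables give p * (p - 1) / 2 distinct pairwise correlations,
# so every variable that is dropped removes a whole row and column.
p <- ncol(num_vars)
n_pairs <- p * (p - 1) / 2
print(n_pairs)
```

With four variables there are six distinct correlations to inspect. With the 18 numeric columns of the cars data there would be 153, which is why reducing dimensionality first makes the matrix easier to read.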
## Standard deviations (1, .., p=18):
## [1] 2.663310e+04 6.336531e+02 5.404142e+02 3.388520e+01 1.054945e+01
## [6] 4.364139e+00 2.699232e+00 1.676887e+00 1.107866e+00 8.460880e-01
## [11] 3.794052e-01 3.141799e-01 2.854633e-01 2.612140e-01 2.452150e-01
## [16] 1.939418e-01 1.468818e-01 5.205969e-16
##
## Rotation (n x k) = (18 x 18):
## PC1 PC2 PC3 PC4
## Sports.Car 4.664256e-06 1.376281e-04 -1.668030e-04 2.135659e-03
## SUV 3.926409e-07 -3.200880e-04 8.640143e-05 -1.330547e-03
## Wagon -5.584797e-07 1.202712e-05 3.265437e-05 -4.483685e-04
## Minivan -5.429529e-07 -9.596557e-05 3.046913e-05 -7.436825e-04
## Pickup 0.000000e+00 2.775558e-17 -5.551115e-17 2.220446e-16
## AWD 1.596014e-06 -2.380564e-04 5.065348e-05 -1.526639e-03
## RWD 7.571334e-06 1.070398e-04 -5.763259e-05 2.224008e-03
## Retail.Price 7.404744e-01 -2.381699e-01 -6.284570e-01 -3.694206e-03
## Dealer.Cost 6.719632e-01 2.799511e-01 6.856313e-01 1.364569e-03
## Engine.Size 2.274067e-05 -9.482492e-04 2.307113e-04 7.365899e-03
## Cyl 3.654059e-05 -1.090188e-03 3.431778e-04 1.078132e-02
## HP 2.200871e-03 -2.913309e-02 7.741448e-03 9.975386e-01
## City.MPG -9.563723e-05 4.568842e-03 -1.660748e-03 -3.709087e-02
## Highway.MPG -9.904539e-05 5.458085e-03 -2.036162e-03 -3.033492e-02
## Weight 1.257923e-02 -9.293940e-01 3.672072e-01 -3.090327e-02
## Wheel.Base 5.417952e-05 -7.588386e-03 4.129063e-03 1.400151e-02
## Length 1.036728e-04 -1.243324e-02 5.034887e-03 3.470477e-02
## Width 3.921017e-05 -3.930300e-03 9.522236e-04 8.055652e-03
## PC5 PC6 PC7 PC8
## Sports.Car -9.277616e-03 7.043937e-03 -1.699490e-02 5.237911e-02
## SUV -1.307656e-02 4.383552e-03 -5.742077e-03 2.471423e-03
## Wagon -1.237310e-03 8.983243e-04 -3.028729e-03 -9.739920e-03
## Minivan 2.308389e-03 -9.682745e-04 2.106389e-02 3.450641e-02
## Pickup -1.665335e-16 -5.551115e-17 -9.020562e-17 -2.567391e-16
## AWD -1.418351e-02 5.831884e-03 -1.335326e-02 -2.516106e-02
## RWD 1.209627e-03 6.813950e-03 3.036752e-02 2.543403e-02
## Retail.Price 4.772513e-04 -5.704585e-05 6.755367e-04 -3.696670e-04
## Dealer.Cost -3.095909e-04 1.397392e-04 -6.841106e-04 4.272107e-04
## Engine.Size 1.267252e-02 9.760385e-03 -1.145745e-02 3.514177e-02
## Cyl 1.611315e-02 1.800829e-02 -5.993707e-03 8.655241e-03
## HP -3.409354e-02 -5.043907e-02 -1.111277e-02 -5.857513e-03
## City.MPG 2.838682e-02 -7.111340e-01 -1.387846e-01 -2.362369e-02
## Highway.MPG 1.106391e-01 -6.654923e-01 -1.408586e-01 6.085202e-03
## Weight -1.396935e-02 -6.629515e-03 -4.673530e-03 -2.551353e-03
## Wheel.Base 3.433947e-01 -1.543644e-01 9.250228e-01 -6.601010e-03
## Length 9.264643e-01 1.562734e-01 -3.197702e-01 -9.680310e-02
## Width 9.193759e-02 1.848764e-04 -2.808757e-02 9.916271e-01
## PC9 PC10 PC11 PC12
## Sports.Car -1.586417e-02 5.172408e-03 -1.939059e-01 9.397434e-02
## SUV 8.231795e-02 -1.991433e-02 2.179351e-01 2.824254e-01
## Wagon -3.619365e-03 -1.357449e-02 -4.658996e-02 -3.436478e-01
## Minivan -1.814420e-03 -2.876530e-02 -4.953584e-02 -6.357442e-03
## Pickup -3.478121e-16 -1.265914e-15 7.494005e-16 5.520757e-16
## AWD 1.261696e-02 -8.292133e-02 6.105930e-01 -2.544330e-01
## RWD 3.491230e-03 1.541480e-01 -7.108886e-01 -1.641387e-01
## Retail.Price 8.860378e-06 -5.209333e-05 -7.094491e-06 -3.712337e-05
## Dealer.Cost 1.785525e-06 4.737114e-05 1.656075e-05 4.631813e-05
## Engine.Size 4.662106e-02 4.095714e-01 4.584568e-02 7.600677e-01
## Cyl 7.212737e-02 8.897488e-01 1.602103e-01 -3.488091e-01
## HP 2.032004e-04 -1.140233e-02 2.996188e-04 -1.696706e-03
## City.MPG 6.833208e-01 -4.169812e-02 -3.164487e-02 -2.100510e-02
## Highway.MPG -7.172857e-01 7.018430e-02 2.716482e-02 1.967286e-02
## Weight -2.002600e-03 -3.204444e-04 -8.603018e-04 -1.699555e-04
## Wheel.Base 1.478122e-02 -7.773695e-04 2.991902e-02 8.423462e-03
## Length 5.605310e-02 -2.680689e-02 -9.281434e-03 -4.846287e-03
## Width 2.485617e-02 -3.180864e-02 4.004066e-02 -3.600757e-02
## PC13 PC14 PC15 PC16
## Sports.Car -6.826348e-02 1.341929e-02 -4.065804e-01 8.831591e-01
## SUV 1.321703e-01 5.005382e-01 4.773684e-01 2.408485e-01
## Wagon -4.052110e-01 -5.395164e-01 5.697403e-01 2.658700e-01
## Minivan 8.836167e-02 -3.675634e-01 -4.350037e-01 -2.002194e-01
## Pickup 3.094801e-15 1.514671e-15 -2.868799e-16 -4.713244e-15
## AWD -6.231493e-01 2.693695e-01 -2.988965e-01 -2.702333e-02
## RWD -4.348980e-01 4.397817e-01 -7.517149e-03 -1.843286e-01
## Retail.Price 3.173141e-05 -2.100314e-05 2.855642e-05 -1.737747e-05
## Dealer.Cost -3.305076e-05 2.083567e-05 -2.769418e-05 1.897891e-05
## Engine.Size -4.271171e-01 -2.309297e-01 1.260485e-02 -9.829235e-02
## Cyl 2.141708e-01 4.291264e-02 -2.723250e-02 7.107806e-02
## HP 8.993003e-04 5.786233e-04 8.280510e-04 -1.471095e-03
## City.MPG -4.770321e-03 -2.403292e-02 -2.984405e-02 -1.124977e-03
## Highway.MPG -5.624382e-03 3.508978e-02 2.704478e-02 5.494734e-03
## Weight 2.001988e-04 -5.533337e-05 -1.782101e-04 -1.416688e-05
## Wheel.Base -5.229849e-03 1.741793e-03 1.767598e-03 2.933672e-02
## Length -4.071112e-03 8.389702e-03 -5.870570e-03 4.480046e-03
## Width 4.326170e-03 8.945077e-03 3.198558e-02 -3.020290e-02
## PC17 PC18
## Sports.Car -7.837203e-04 4.431569e-15
## SUV -5.580682e-01 2.003898e-16
## Wagon -1.699023e-01 3.820757e-15
## Minivan -7.891813e-01 5.861165e-17
## Pickup 8.300076e-16 1.000000e+00
## AWD -5.840771e-02 9.998765e-16
## RWD -1.603820e-01 7.132248e-16
## Retail.Price 1.562904e-05 -6.711892e-19
## Dealer.Cost -1.607443e-05 1.005849e-18
## Engine.Size 5.493540e-02 1.293071e-15
## Cyl -3.863841e-02 9.606336e-16
## HP -1.368756e-03 -1.297282e-16
## City.MPG 2.575204e-02 2.291183e-16
## Highway.MPG -3.837248e-02 -2.446454e-16
## Weight 1.089482e-04 7.928040e-19
## Wheel.Base 1.381722e-02 1.935772e-16
## Length -1.107457e-02 -7.725097e-17
## Width 2.809927e-02 4.496021e-17
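In the loadings above, PC1 is almost entirely Retail.Price and Dealer.Cost, and the standard deviations span many orders of magnitude. That pattern is what I would expect from PCA on unscaled columns, where the variables with the largest raw variance dominate. The sketch below compares unscaled and scaled PCA with base R's `prcomp()`. It assumes `mtcars` as a stand-in dataset, so the numbers will not match the cars output.

```r
# Assumption: mtcars stands in for the cars data.
x <- mtcars[, c("mpg", "disp", "hp", "wt")]

pca_raw <- prcomp(x)                                    # centred only
pca_scaled <- prcomp(x, center = TRUE, scale. = TRUE)   # centred and standardized

# Share of total variance carried by each component.
share_raw <- pca_raw$sdev^2 / sum(pca_raw$sdev^2)
share_scaled <- pca_scaled$sdev^2 / sum(pca_scaled$sdev^2)
print(round(rbind(raw = share_raw, scaled = share_scaled), 3))

# Variable with the largest absolute loading on the first unscaled component.
top_raw <- names(which.max(abs(pca_raw$rotation[, 1])))
print(top_raw)
```

After standardization every variable contributes one unit of variance, so the component variances sum to the number of variables and no single column can dominate purely because of its measurement units.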
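PCA on the non-binary numeric variables of cars. The output below covers 11 such variables (cars[, 9:19]). By default PCA() generates two graphs and extracts the first five PCs; the call shown here sets ncp = 4, so four are kept.
- Summary of the first 100 cars
The nbelements argument of summary() controls how many rows of the dataset are included in the summary.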
##
## Call:
## PCA(X = cars[, 9:19], ncp = 4, graph = T)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## Variance 7.105 1.884 0.850 0.357 0.275 0.198
## % of var. 64.588 17.127 7.725 3.246 2.504 1.799
## Cumulative % of var. 64.588 81.714 89.439 92.685 95.189 96.988
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11
## Variance 0.141 0.087 0.066 0.037 0.001
## % of var. 1.277 0.788 0.604 0.336 0.007
## Cumulative % of var. 98.266 99.053 99.657 99.993 100.000
##
## Individuals (the 100 first)
## Dist Dim.1 ctr cos2 Dim.2 ctr cos2
## 1 | 4.568 | -4.533 0.747 0.985 | -0.290 0.012 0.004 |
## 2 | 4.977 | -4.791 0.835 0.927 | -0.771 0.082 0.024 |
## 3 | 3.415 | -3.208 0.374 0.882 | 0.614 0.052 0.032 |
## 4 | 3.450 | -3.262 0.387 0.894 | 0.526 0.038 0.023 |
## 5 | 3.367 | -3.159 0.363 0.881 | 0.528 0.038 0.025 |
## 6 | 3.915 | -3.791 0.523 0.937 | 0.276 0.010 0.005 |
## 7 | 3.860 | -3.733 0.507 0.935 | 0.221 0.007 0.003 |
## 8 | 3.694 | -3.647 0.484 0.975 | -0.001 0.000 0.000 |
## 9 | 4.020 | -3.951 0.568 0.966 | 0.066 0.001 0.000 |
## 10 | 3.639 | -3.591 0.469 0.974 | -0.107 0.002 0.001 |
## 11 | 3.594 | -3.547 0.458 0.974 | -0.093 0.001 0.001 |
## 12 | 4.584 | -4.399 0.704 0.921 | 0.252 0.009 0.003 |
## 13 | 5.443 | -4.896 0.872 0.809 | 0.231 0.007 0.002 |
## 14 | 4.416 | -4.203 0.643 0.906 | 0.241 0.008 0.003 |
## 15 | 4.762 | -4.696 0.802 0.972 | -0.387 0.021 0.007 |
## 16 | 4.715 | -4.647 0.785 0.972 | -0.436 0.026 0.009 |
## 17 | 4.687 | -4.621 0.777 0.972 | -0.430 0.025 0.008 |
## 18 | 3.401 | -3.360 0.411 0.976 | 0.354 0.017 0.011 |
## 19 | 3.357 | -3.318 0.400 0.977 | 0.280 0.011 0.007 |
## 20 | 3.324 | -3.288 0.393 0.978 | 0.295 0.012 0.008 |
## 21 | 2.306 | -1.838 0.123 0.635 | 1.192 0.195 0.267 |
## 22 | 4.570 | -4.488 0.733 0.965 | -0.405 0.023 0.008 |
## 23 | 4.442 | -4.324 0.680 0.948 | -0.437 0.026 0.010 |
## 24 | 3.424 | -3.364 0.412 0.965 | 0.357 0.018 0.011 |
## 25 | 3.373 | -3.318 0.400 0.968 | 0.303 0.013 0.008 |
## 26 | 3.336 | -3.285 0.392 0.969 | 0.255 0.009 0.006 |
## 27 | 5.292 | -4.848 0.855 0.839 | -1.249 0.214 0.056 |
## 28 | 3.984 | -3.938 0.564 0.977 | 0.146 0.003 0.001 |
## 29 | 3.903 | -3.856 0.541 0.976 | 0.074 0.001 0.000 |
## 30 | 2.980 | -2.884 0.302 0.937 | 0.479 0.031 0.026 |
## 31 | 3.514 | -3.336 0.405 0.901 | 0.620 0.053 0.031 |
## 32 | 3.412 | -3.246 0.383 0.905 | 0.460 0.029 0.018 |
## 33 | 3.370 | -3.205 0.374 0.905 | 0.387 0.021 0.013 |
## 34 | 3.267 | -3.115 0.353 0.909 | 0.541 0.040 0.027 |
## 35 | 3.225 | -3.075 0.344 0.909 | 0.468 0.030 0.021 |
## 36 | 5.562 | -5.329 1.033 0.918 | -0.980 0.132 0.031 |
## 37 | 3.361 | -3.275 0.390 0.950 | -0.196 0.005 0.003 |
## 38 | 3.311 | -3.230 0.380 0.952 | -0.277 0.011 0.007 |
## 39 | 3.293 | -3.221 0.377 0.957 | 0.406 0.023 0.015 |
## 40 | 3.066 | -2.930 0.312 0.913 | 0.252 0.009 0.007 |
## 41 | 4.592 | -4.327 0.681 0.888 | 0.203 0.006 0.002 |
## 42 | 4.565 | -4.296 0.671 0.885 | 0.170 0.004 0.001 |
## 43 | 4.560 | -4.289 0.669 0.885 | 0.158 0.003 0.001 |
## 44 | 6.267 | -5.987 1.304 0.913 | -0.839 0.096 0.018 |
## 45 | 5.774 | -5.605 1.143 0.942 | -0.875 0.105 0.023 |
## 46 | 6.248 | -5.963 1.293 0.911 | -0.860 0.101 0.019 |
## 47 | 1.478 | -0.232 0.002 0.025 | 1.218 0.203 0.679 |
## 48 | 1.893 | -0.056 0.000 0.001 | 1.519 0.316 0.644 |
## 49 | 2.583 | -2.264 0.186 0.768 | 0.882 0.107 0.117 |
## 50 | 1.320 | -0.689 0.017 0.272 | 0.523 0.038 0.157 |
## 51 | 1.826 | -0.157 0.001 0.007 | 1.431 0.281 0.614 |
## 52 | 2.815 | -2.584 0.243 0.842 | -0.170 0.004 0.004 |
## 53 | 2.716 | -2.475 0.223 0.830 | -0.361 0.018 0.018 |
## 54 | 2.144 | -1.609 0.094 0.563 | 1.147 0.180 0.286 |
## 55 | 1.166 | -0.565 0.012 0.235 | 0.714 0.070 0.375 |
## 56 | 2.229 | 0.297 0.003 0.018 | 1.834 0.461 0.677 |
## 57 | 2.078 | -1.444 0.076 0.483 | 1.158 0.184 0.311 |
## 58 | 2.032 | -1.410 0.072 0.481 | 1.089 0.163 0.287 |
## 59 | 2.987 | -2.674 0.260 0.801 | -0.429 0.025 0.021 |
## 60 | 1.737 | -0.204 0.002 0.014 | 1.458 0.292 0.705 |
## 61 | 1.469 | 0.183 0.001 0.015 | 1.156 0.183 0.619 |
## 62 | 2.609 | -2.262 0.186 0.752 | 0.781 0.084 0.090 |
## 63 | 2.528 | -2.173 0.172 0.739 | 0.680 0.063 0.072 |
## 64 | 4.211 | -4.001 0.582 0.903 | 0.120 0.002 0.001 |
## 65 | 3.347 | -3.218 0.377 0.925 | -0.560 0.043 0.028 |
## 66 | 7.282 | -5.722 1.191 0.617 | 0.210 0.006 0.001 |
## 67 | 11.382 | -8.695 2.750 0.584 | -0.962 0.127 0.007 |
## 68 | 1.409 | -0.669 0.016 0.226 | 0.817 0.092 0.336 |
## 69 | 1.361 | -0.644 0.015 0.224 | 0.772 0.082 0.322 |
## 70 | 1.402 | -0.741 0.020 0.280 | 0.842 0.097 0.361 |
## 71 | 2.369 | -2.163 0.170 0.834 | 0.679 0.063 0.082 |
## 72 | 1.785 | -0.133 0.001 0.006 | 1.465 0.294 0.673 |
## 73 | 4.704 | -4.117 0.616 0.766 | -1.532 0.322 0.106 |
## 74 | 2.054 | -1.236 0.056 0.362 | 1.050 0.151 0.261 |
## 75 | 2.814 | -2.598 0.245 0.852 | -0.192 0.005 0.005 |
## 76 | 2.543 | -2.267 0.187 0.795 | 0.871 0.104 0.117 |
## 77 | 1.270 | -0.694 0.018 0.298 | 0.477 0.031 0.141 |
## 78 | 1.811 | 0.408 0.006 0.051 | 1.447 0.287 0.639 |
## 79 | 2.915 | -2.824 0.290 0.939 | 0.373 0.019 0.016 |
## 80 | 1.350 | -0.743 0.020 0.303 | 0.534 0.039 0.157 |
## 81 | 2.538 | -2.321 0.196 0.836 | -0.247 0.008 0.009 |
## 82 | 1.988 | -1.716 0.107 0.745 | 0.413 0.023 0.043 |
## 83 | 1.590 | -0.803 0.023 0.255 | 1.010 0.140 0.404 |
## 84 | 2.369 | -1.974 0.142 0.694 | 0.978 0.131 0.170 |
## 85 | 0.987 | -0.500 0.009 0.256 | 0.553 0.042 0.314 |
## 86 | 2.401 | -1.762 0.113 0.538 | 1.210 0.201 0.254 |
## 87 | 1.141 | -0.064 0.000 0.003 | 0.772 0.082 0.458 |
## 88 | 9.020 | -6.178 1.388 0.469 | 0.342 0.016 0.001 |
## 89 | 3.468 | -3.326 0.402 0.920 | -0.373 0.019 0.012 |
## 90 | 3.144 | -3.000 0.327 0.910 | -0.588 0.047 0.035 |
## 91 | 5.701 | -4.823 0.846 0.716 | -0.144 0.003 0.001 |
## 92 | 3.498 | -3.287 0.393 0.883 | -0.776 0.083 0.049 |
## 93 | 2.810 | -2.655 0.256 0.892 | -0.654 0.059 0.054 |
## 94 | 1.766 | -1.538 0.086 0.759 | -0.028 0.000 0.000 |
## 95 | 2.248 | -2.036 0.151 0.820 | 0.062 0.001 0.001 |
## 96 | 1.389 | -1.033 0.039 0.553 | -0.242 0.008 0.030 |
## 97 | 1.797 | 0.713 0.019 0.158 | 1.372 0.258 0.583 |
## 98 | 1.464 | 0.265 0.003 0.033 | 1.051 0.152 0.515 |
## 99 | 1.399 | 0.803 0.023 0.329 | 0.731 0.073 0.273 |
## 100 | 1.718 | 0.428 0.007 0.062 | 1.291 0.229 0.565 |
## Dim.3 ctr cos2
## 1 -0.113 0.004 0.001 |
## 2 -0.451 0.062 0.008 |
## 3 0.828 0.209 0.059 |
## 4 0.798 0.194 0.053 |
## 5 0.875 0.233 0.068 |
## 6 0.741 0.167 0.036 |
## 7 0.769 0.180 0.040 |
## 8 -0.132 0.005 0.001 |
## 9 0.280 0.024 0.005 |
## 10 -0.074 0.002 0.000 |
## 11 -0.084 0.002 0.001 |
## 12 1.187 0.428 0.067 |
## 13 2.248 1.536 0.170 |
## 14 1.267 0.489 0.082 |
## 15 -0.201 0.012 0.002 |
## 16 -0.176 0.009 0.001 |
## 17 -0.181 0.010 0.001 |
## 18 0.278 0.023 0.007 |
## 19 0.318 0.031 0.009 |
## 20 0.308 0.029 0.009 |
## 21 -0.054 0.001 0.001 |
## 22 -0.573 0.100 0.016 |
## 23 -0.767 0.179 0.030 |
## 24 -0.278 0.024 0.007 |
## 25 -0.249 0.019 0.005 |
## 26 -0.223 0.015 0.004 |
## 27 -0.309 0.029 0.003 |
## 28 0.424 0.055 0.011 |
## 29 0.461 0.065 0.014 |
## 30 0.143 0.006 0.002 |
## 31 0.471 0.067 0.018 |
## 32 0.558 0.095 0.027 |
## 33 0.598 0.109 0.031 |
## 34 0.589 0.105 0.032 |
## 35 0.628 0.120 0.038 |
## 36 0.262 0.021 0.002 |
## 37 -0.500 0.076 0.022 |
## 38 -0.456 0.063 0.019 |
## 39 -0.370 0.042 0.013 |
## 40 -0.599 0.109 0.038 |
## 41 1.474 0.661 0.103 |
## 42 1.491 0.676 0.107 |
## 43 1.498 0.682 0.108 |
## 44 1.270 0.491 0.041 |
## 45 0.654 0.130 0.013 |
## 46 1.281 0.499 0.042 |
## 47 0.342 0.036 0.054 |
## 48 0.856 0.223 0.205 |
## 49 0.564 0.097 0.048 |
## 50 0.180 0.010 0.019 |
## 51 0.822 0.205 0.203 |
## 52 -0.852 0.221 0.092 |
## 53 -0.748 0.170 0.076 |
## 54 0.135 0.006 0.004 |
## 55 0.062 0.001 0.003 |
## 56 0.764 0.178 0.118 |
## 57 -0.189 0.011 0.008 |
## 58 -0.151 0.007 0.006 |
## 59 -0.998 0.303 0.112 |
## 60 0.037 0.000 0.000 |
## 61 -0.063 0.001 0.002 |
## 62 0.828 0.209 0.101 |
## 63 0.882 0.237 0.122 |
## 64 1.210 0.445 0.083 |
## 65 -0.415 0.052 0.015 |
## 66 4.115 5.150 0.319 |
## 67 6.361 12.306 0.312 |
## 68 -0.505 0.077 0.128 |
## 69 -0.480 0.070 0.124 |
## 70 -0.461 0.065 0.108 |
## 71 0.338 0.035 0.020 |
## 72 0.119 0.004 0.004 |
## 73 -0.802 0.196 0.029 |
## 74 -0.259 0.020 0.016 |
## 75 -0.670 0.137 0.057 |
## 76 0.394 0.047 0.024 |
## 77 -0.093 0.003 0.005 |
## 78 0.567 0.098 0.098 |
## 79 0.201 0.012 0.005 |
## 80 -0.170 0.009 0.016 |
## 81 -0.836 0.212 0.108 |
## 82 -0.530 0.085 0.071 |
## 83 -0.463 0.065 0.085 |
## 84 0.592 0.107 0.062 |
## 85 0.104 0.003 0.011 |
## 86 0.712 0.154 0.088 |
## 87 0.126 0.005 0.012 |
## 88 5.561 9.403 0.380 |
## 89 -0.670 0.136 0.037 |
## 90 -0.513 0.080 0.027 |
## 91 2.558 1.990 0.201 |
## 92 -0.672 0.137 0.037 |
## 93 -0.173 0.009 0.004 |
## 94 -0.098 0.003 0.003 |
## 95 -0.095 0.003 0.002 |
## 96 -0.270 0.022 0.038 |
## 97 0.648 0.128 0.130 |
## 98 0.482 0.071 0.108 |
## 99 0.188 0.011 0.018 |
## 100 0.661 0.133 0.148 |
##
## Variables
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr
## Retail.Price | 0.703 6.956 0.494 | -0.643 21.950 0.414 | 0.235 6.501
## Dealer.Cost | 0.699 6.881 0.489 | -0.645 22.104 0.416 | 0.237 6.618
## Engine.Size | 0.925 12.046 0.856 | 0.021 0.024 0.000 | 0.044 0.223
## Cyl | 0.891 11.168 0.793 | -0.107 0.609 0.011 | 0.075 0.663
## HP | 0.849 10.151 0.721 | -0.401 8.539 0.161 | 0.070 0.583
## City.MPG | -0.828 9.640 0.685 | 0.005 0.001 0.000 | 0.493 28.629
## Highway.MPG | -0.817 9.400 0.668 | 0.015 0.012 0.000 | 0.552 35.880
## Weight | 0.896 11.312 0.804 | 0.230 2.804 0.053 | -0.103 1.259
## Wheel.Base | 0.710 7.087 0.503 | 0.574 17.487 0.329 | 0.244 6.994
## Length | 0.684 6.594 0.468 | 0.561 16.680 0.314 | 0.318 11.882
## Width | 0.789 8.765 0.623 | 0.429 9.790 0.184 | 0.081 0.767
## cos2
## Retail.Price 0.055 |
## Dealer.Cost 0.056 |
## Engine.Size 0.002 |
## Cyl 0.006 |
## HP 0.005 |
## City.MPG 0.243 |
## Highway.MPG 0.305 |
## Weight 0.011 |
## Wheel.Base 0.059 |
## Length 0.101 |
## Width 0.007 |
## eigenvalue percentage of variance
## comp 1 7.1046384308 64.587622098
## comp 2 1.8839247679 17.126588799
## comp 3 0.8497282852 7.724802592
## comp 4 0.3570154894 3.245595359
## comp 5 0.2754355932 2.503959939
## comp 6 0.1979437155 1.799488322
## comp 7 0.1405192086 1.277447350
## comp 8 0.0866388119 0.787625563
## comp 9 0.0663879807 0.603527097
## comp 10 0.0369773622 0.336157838
## comp 11 0.0007903547 0.007185043
## cumulative percentage of variance
## comp 1 64.58762
## comp 2 81.71421
## comp 3 89.43901
## comp 4 92.68461
## comp 5 95.18857
## comp 6 96.98806
## comp 7 98.26550
## comp 8 99.05313
## comp 9 99.65666
## comp 10 99.99281
## comp 11 100.00000
## Dim.1 Dim.2 Dim.3 Dim.4
## Retail.Price 0.4942292 4.135222e-01 0.055242376 0.027966771
## Dealer.Cost 0.4888778 4.164186e-01 0.056233118 0.029554656
## Engine.Size 0.8558593 4.437324e-04 0.001892595 0.098541461
## Cyl 0.7934611 1.147121e-02 0.005637083 0.146141962
## HP 0.7211734 1.608659e-01 0.004957347 0.001215251
## City.MPG 0.6848793 2.134397e-05 0.243270722 0.012387933
## Highway.MPG 0.6678118 2.264843e-04 0.304880981 0.005645006
## Weight 0.8036585 5.283288e-02 0.010700651 0.005102206
## Wheel.Base 0.5034900 3.294459e-01 0.059426280 0.017390311
## Length 0.4684884 3.142384e-01 0.100967748 0.010135761
## Width 0.6227096 1.844381e-01 0.006519384 0.002934173
## Dim.1 Dim.2 Dim.3 Dim.4
## Retail.Price 6.956430 21.950039964 6.5011813 7.8334894
## Dealer.Cost 6.881107 22.103781152 6.6177765 8.2782559
## Engine.Size 12.046487 0.023553613 0.2227294 27.6014525
## Cyl 11.168213 0.608899472 0.6633983 40.9343478
## HP 10.150740 8.538871564 0.5834038 0.3403916
## City.MPG 9.639890 0.001132952 28.6292367 3.4698587
## Highway.MPG 9.399659 0.012021939 35.8798202 1.5811655
## Weight 11.311744 2.804404780 1.2593026 1.4291273
## Wheel.Base 7.086778 17.487209278 6.9935627 4.8710243
## Length 6.594120 16.679985586 11.8823570 2.8390255
## Width 8.764832 9.790099701 0.7672316 0.8218616
## $Dim.1
## $Dim.1$quanti
## correlation p.value
## Engine.Size 0.9251267 5.116179e-164
## Weight 0.8964700 3.638599e-138
## Cyl 0.8907643 6.262131e-134
## HP 0.8492193 8.059693e-109
## Width 0.7891195 1.664116e-83
## Wheel.Base 0.7095703 1.671006e-60
## Retail.Price 0.7030143 5.914584e-59
## Dealer.Cost 0.6991979 4.510056e-58
## Length 0.6844621 8.580649e-55
## Highway.MPG -0.8171975 3.651747e-94
## City.MPG -0.8275744 1.404114e-98
##
##
## $Dim.2
## $Dim.2$quanti
## correlation p.value
## Wheel.Base 0.5739738 2.730836e-35
## Length 0.5605697 2.095168e-33
## Width 0.4294626 8.445613e-19
## Weight 0.2298540 4.913548e-06
## Cyl -0.1071037 3.518485e-02
## HP -0.4010809 2.175520e-16
## Retail.Price -0.6430569 1.540011e-46
## Dealer.Cost -0.6453051 5.917390e-47
##
##
## $Dim.3
## $Dim.3$quanti
## correlation p.value
## Highway.MPG 0.5521603 2.888889e-32
## City.MPG 0.4932248 4.059874e-25
## Length 0.3177542 1.581611e-10
## Wheel.Base 0.2437751 1.212886e-06
## Dealer.Cost 0.2371352 2.389210e-06
## Retail.Price 0.2350370 2.948038e-06
## Weight -0.1034439 4.196587e-02
## comp 1 comp 2 comp 3
## 64.587622 17.126589 7.724803
## comp 1 comp 2 comp 3
## 64.58762 81.71421 89.43901
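The percentage and cumulative percentage of variance in the output above follow directly from the eigenvalues. As a small worked check, the sketch below recomputes them in base R from the eigenvalues printed earlier. The vector is copied from that output, not recomputed from the cars data.

```r
# Eigenvalues copied from the PCA output above (comp 1 to comp 11).
eig <- c(7.1046384308, 1.8839247679, 0.8497282852, 0.3570154894,
         0.2754355932, 0.1979437155, 0.1405192086, 0.0866388119,
         0.0663879807, 0.0369773622, 0.0007903547)

# On standardized data the eigenvalues sum to the number of variables (11 here).
total <- sum(eig)

# Percentage of variance per component, and the running total.
pct <- 100 * eig / total
cum_pct <- cumsum(pct)

print(round(head(data.frame(eigenvalue = eig, pct = pct, cum_pct = cum_pct), 3), 3))
```

The first three components reproduce the 64.588, 17.127 and 7.725 percent shown above, which together account for about 89.4% of the variance.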
PCA() allows you to specify quantitative and qualitative supplementary variables through its quanti.sup and quali.sup arguments.
dudi.pca() is the main function that implements PCA in the ade4 package. By default it runs interactively and asks how many axes to keep. Set the scannf argument to FALSE to suppress the interactive mode, and use the nf argument to set the number of axes to retain.
## Class: pca dudi
## Call: dudi.pca(df = cars[, 8:18], scannf = FALSE, nf = 4)
##
## Total inertia: 11
##
## Eigenvalues:
## Ax1 Ax2 Ax3 Ax4 Ax5
## 7.1046 1.8839 0.8497 0.3570 0.2754
##
## Projected inertia (%):
## Ax1 Ax2 Ax3 Ax4 Ax5
## 64.588 17.127 7.725 3.246 2.504
##
## Cumulative projected inertia (%):
## Ax1 Ax1:2 Ax1:3 Ax1:4 Ax1:5
## 64.59 81.71 89.44 92.68 95.19
##
## (Only 5 dimensions (out of 11) are shown)
##
## Call:
## PCA(X = cars[, 9:19], ncp = 4, graph = T)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## Variance 7.105 1.884 0.850 0.357 0.275 0.198
## % of var. 64.588 17.127 7.725 3.246 2.504 1.799
## Cumulative % of var. 64.588 81.714 89.439 92.685 95.189 96.988
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11
## Variance 0.141 0.087 0.066 0.037 0.001
## % of var. 1.277 0.788 0.604 0.336 0.007
## Cumulative % of var. 98.266 99.053 99.657 99.993 100.000
##
## Individuals (the 10 first)
## Dist Dim.1 ctr cos2 Dim.2 ctr cos2
## 1 | 4.568 | -4.533 0.747 0.985 | -0.290 0.012 0.004 |
## 2 | 4.977 | -4.791 0.835 0.927 | -0.771 0.082 0.024 |
## 3 | 3.415 | -3.208 0.374 0.882 | 0.614 0.052 0.032 |
## 4 | 3.450 | -3.262 0.387 0.894 | 0.526 0.038 0.023 |
## 5 | 3.367 | -3.159 0.363 0.881 | 0.528 0.038 0.025 |
## 6 | 3.915 | -3.791 0.523 0.937 | 0.276 0.010 0.005 |
## 7 | 3.860 | -3.733 0.507 0.935 | 0.221 0.007 0.003 |
## 8 | 3.694 | -3.647 0.484 0.975 | -0.001 0.000 0.000 |
## 9 | 4.020 | -3.951 0.568 0.966 | 0.066 0.001 0.000 |
## 10 | 3.639 | -3.591 0.469 0.974 | -0.107 0.002 0.001 |
## Dim.3 ctr cos2
## 1 -0.113 0.004 0.001 |
## 2 -0.451 0.062 0.008 |
## 3 0.828 0.209 0.059 |
## 4 0.798 0.194 0.053 |
## 5 0.875 0.233 0.068 |
## 6 0.741 0.167 0.036 |
## 7 0.769 0.180 0.040 |
## 8 -0.132 0.005 0.001 |
## 9 0.280 0.024 0.005 |
## 10 -0.074 0.002 0.000 |
##
## Variables (the 10 first)
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr
## Retail.Price | 0.703 6.956 0.494 | -0.643 21.950 0.414 | 0.235 6.501
## Dealer.Cost | 0.699 6.881 0.489 | -0.645 22.104 0.416 | 0.237 6.618
## Engine.Size | 0.925 12.046 0.856 | 0.021 0.024 0.000 | 0.044 0.223
## Cyl | 0.891 11.168 0.793 | -0.107 0.609 0.011 | 0.075 0.663
## HP | 0.849 10.151 0.721 | -0.401 8.539 0.161 | 0.070 0.583
## City.MPG | -0.828 9.640 0.685 | 0.005 0.001 0.000 | 0.493 28.629
## Highway.MPG | -0.817 9.400 0.668 | 0.015 0.012 0.000 | 0.552 35.880
## Weight | 0.896 11.312 0.804 | 0.230 2.804 0.053 | -0.103 1.259
## Wheel.Base | 0.710 7.087 0.503 | 0.574 17.487 0.329 | 0.244 6.994
## Length | 0.684 6.594 0.468 | 0.561 16.680 0.314 | 0.318 11.882
## cos2
## Retail.Price 0.055 |
## Dealer.Cost 0.056 |
## Engine.Size 0.002 |
## Cyl 0.006 |
## HP 0.005 |
## City.MPG 0.243 |
## Highway.MPG 0.305 |
## Weight 0.011 |
## Wheel.Base 0.059 |
## Length 0.101 |
The following plots identify the variables' contributions to the extracted principal components.
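The contribution values can also be checked numerically. In the output above, a variable's contribution (ctr) to a dimension equals its cos2 on that dimension divided by the dimension's eigenvalue, times 100. The sketch below verifies this for Dim.1, with the cos2 values and the eigenvalue copied from the output rather than recomputed.

```r
# cos2 of each variable on Dim.1, copied from the output above.
cos2_dim1 <- c(Retail.Price = 0.4942292, Dealer.Cost = 0.4888778,
               Engine.Size = 0.8558593, Cyl = 0.7934611, HP = 0.7211734,
               City.MPG = 0.6848793, Highway.MPG = 0.6678118,
               Weight = 0.8036585, Wheel.Base = 0.5034900,
               Length = 0.4684884, Width = 0.6227096)

# Eigenvalue of Dim.1, copied from the output above.
eig1 <- 7.1046384308

# Contribution in percent: the cos2 values on a dimension sum to its eigenvalue.
contrib_dim1 <- 100 * cos2_dim1 / eig1
print(sort(round(contrib_dim1, 2), decreasing = TRUE))
```

Engine.Size comes out at about 12.05%, matching the ctr column, and the contributions sum to 100 because the cos2 values on a dimension add up to its eigenvalue.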
In this section I will cover how to deal with missing data using a dimensionality reduction technique called Non-negative matrix factorization (NNMF). This section will cover:
- How many PCs to retain?
  - Kaiser-Guttman rule: keep the PCs with eigenvalue > 1
  - Scree test (constructing the screeplot): look for the elbow
  - Parallel analysis
The data used for this analysis is the airquality dataset from the datasets package. This dataset contains daily air quality measurements in New York, May to September 1973. The data consist of 153 observations and 6 variables.
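As a rough illustration of the first two retention rules, the sketch below applies them to airquality with base R only. It drops the incomplete rows first, which is a simplifying assumption of mine, so its eigenvalues need not match the PCA output that follows.

```r
# Assumption: incomplete rows are dropped for this sketch.
aq <- na.omit(airquality)

# Eigenvalues of the correlation matrix, i.e. variances of the standardized PCs.
eig <- eigen(cor(aq))$values
print(round(eig, 3))

# Kaiser-Guttman rule: keep components whose eigenvalue exceeds 1.
n_kaiser <- sum(eig > 1)
print(n_kaiser)

# Scree test, numerically: the elbow is where the drops between
# successive eigenvalues level off.
drops <- -diff(eig)
print(round(drops, 3))

# plot(eig, type = "b") draws the same information as a scree plot.
```

The Kaiser-Guttman rule is simple but sensitive to eigenvalues that sit close to 1, which is one reason to compare it against the scree plot and parallel analysis.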
##
## Call:
## PCA(X = airquality)
##
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## Variance 2.318 1.165 0.983 0.790 0.435 0.310
## % of var. 38.625 19.411 16.385 13.175 7.246 5.158
## Cumulative % of var. 38.625 58.036 74.421 87.596 94.842 100.000
##
## Individuals (the 10 first)
## Dist Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3
## 1 | 2.582 | -0.570 0.092 0.049 | -1.539 1.329 0.355 | -0.229
## 2 | 2.404 | -0.663 0.124 0.076 | -0.922 0.477 0.147 | -0.437
## 3 | 2.473 | -1.536 0.665 0.386 | -1.246 0.871 0.254 | -0.834
## 4 | 3.101 | -1.536 0.665 0.245 | -2.467 3.416 0.633 | -0.148
## 5 | 3.225 | -2.191 1.354 0.462 | -1.668 1.561 0.267 | -0.136
## 6 | 2.653 | -1.948 1.071 0.540 | -1.549 1.346 0.341 | -0.368
## 7 | 2.667 | -0.947 0.253 0.126 | -2.050 2.358 0.591 | 0.257
## 8 | 3.101 | -2.668 2.008 0.741 | -0.737 0.305 0.057 | -0.302
## 9 | 4.380 | -3.841 4.161 0.769 | -0.329 0.061 0.006 | -0.874
## 10 | 1.863 | -0.679 0.130 0.133 | -1.106 0.687 0.353 | 0.455
## ctr cos2 Dim.4 ctr cos2
## 1 0.035 0.008 | -1.861 2.863 0.519 |
## 2 0.127 0.033 | -2.072 3.551 0.743 |
## 3 0.463 0.114 | -1.001 0.828 0.164 |
## 4 0.015 0.002 | -0.318 0.084 0.011 |
## 5 0.012 0.002 | -0.782 0.506 0.059 |
## 6 0.090 0.019 | -0.445 0.163 0.028 |
## 7 0.044 0.009 | -0.696 0.400 0.068 |
## 8 0.060 0.009 | -1.140 1.074 0.135 |
## 9 0.508 0.040 | -0.526 0.229 0.014 |
## 10 0.137 0.060 | -1.207 1.205 0.420 |
##
## Variables
## Dim.1 ctr cos2 Dim.2 ctr cos2 Dim.3 ctr
## Ozone | 0.828 29.610 0.686 | -0.078 0.517 0.006 | 0.295 8.877
## Solar.R | 0.385 6.402 0.148 | -0.720 44.559 0.519 | 0.167 2.821
## Wind | -0.715 22.029 0.511 | -0.178 2.719 0.032 | -0.200 4.072
## Temp | 0.866 32.341 0.750 | 0.056 0.267 0.003 | -0.126 1.612
## Month | 0.447 8.608 0.199 | 0.558 26.725 0.311 | -0.514 26.881
## Day | -0.153 1.010 0.023 | 0.542 25.212 0.294 | 0.740 55.737
## cos2 Dim.4 ctr cos2
## Ozone 0.087 | -0.082 0.854 0.007 |
## Solar.R 0.028 | 0.481 29.245 0.231 |
## Wind 0.040 | 0.493 30.782 0.243 |
## Temp 0.016 | 0.127 2.039 0.016 |
## Month 0.264 | 0.404 20.665 0.163 |
## Day 0.548 | 0.360 16.416 0.130 |
##
## Using eigendecomposition of correlation matrix.
## Computing: 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
##
##
## Results of Horn's Parallel Analysis for component retention
## 180 iterations, using the mean estimate
##
## --------------------------------------------------
## Component Adjusted Unadjusted Estimated
## Eigenvalue Eigenvalue Bias
## --------------------------------------------------
## 1 2.132182 2.468840 0.336658
## --------------------------------------------------
##
## Adjusted eigenvalues > 1 indicate dimensions to retain.
## (1 components retained)
## [1] 1
## Parallel analysis suggests that the number of factors = 3 and the number of components = 1
## [1] 1
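To show what parallel analysis does, here is a minimal base-R sketch of Horn's procedure. It is not the implementation behind the output above. It assumes complete cases of airquality, standard normal random data, and the mean of the simulated eigenvalues as the bias estimate. The 180 iterations mirror the output, but the random draws differ, so the adjusted values will not match exactly.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

aq <- na.omit(airquality)   # assumption: complete cases only
n <- nrow(aq)
p <- ncol(aq)

# Observed eigenvalues of the correlation matrix.
observed <- eigen(cor(aq))$values

# Eigenvalues from uncorrelated random data of the same shape (p x 180 matrix).
random_eig <- replicate(180, eigen(cor(matrix(rnorm(n * p), n, p)))$values)

# Bias: how far random eigenvalues sit above 1 purely because of sampling noise.
bias <- rowMeans(random_eig) - 1

# Adjusted eigenvalues; those above 1 suggest components worth retaining.
adjusted <- observed - bias
n_retain <- sum(adjusted > 1)

print(round(cbind(adjusted, observed, bias), 3))
```

The idea is that a component is only worth keeping if it explains more variance than a component extracted from pure noise of the same dimensions would.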
Estimation methods for PCA with missing values:
Mean imputation is problematic because it distorts the distribution of the variables when the data has a lot of missing values.
## Ozone Solar.R Wind Temp
## Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
## 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
## Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
## Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
## 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
## Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
## NA's :37 NA's :7
## Month Day
## Min. :5.000 Min. : 1.0
## 1st Qu.:6.000 1st Qu.: 8.0
## Median :7.000 Median :16.0
## Mean :6.993 Mean :15.8
## 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :9.000 Max. :31.0
##
## [1] 44
## [1] 42
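The two counts above are consistent with the number of missing cells and the number of incomplete rows in airquality. The sketch below reproduces both counts in base R and then shows why mean imputation distorts a variable: filling the gaps in Ozone with its mean leaves the mean unchanged but shrinks the spread.

```r
# Missing cells overall, and rows with at least one missing value.
n_missing_cells <- sum(is.na(airquality))
n_incomplete_rows <- sum(!complete.cases(airquality))
print(c(cells = n_missing_cells, rows = n_incomplete_rows))

# Mean imputation for Ozone.
ozone <- airquality$Ozone
ozone_mean <- mean(ozone, na.rm = TRUE)
ozone_imputed <- ifelse(is.na(ozone), ozone_mean, ozone)

# The imputed values all sit at the mean, so the standard deviation drops.
sd_observed <- sd(ozone, na.rm = TRUE)
sd_imputed <- sd(ozone_imputed)
print(round(c(observed = sd_observed, imputed = sd_imputed), 2))
```

Because the artificially reduced variance also weakens correlations with other variables, it feeds directly into any PCA computed afterwards.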
pca():
- Uses regression methods for approximation of the correlation matrix
- Compiles PCA models
- Projects the new points back into the original space
## $ncp
## [1] 0
##
## $criterion
## 0 1 2 3 4 5
## 1520.506 1823.946 1771.702 2774.323 2888.306 6369.592
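As an alternative to mean imputation, missing cells can be filled in by PCA itself. The sketch below is a simplified, unregularized version of iterative PCA imputation written in base R. It is not the code behind pca() or behind the output above: the function name `impute_pca` and its arguments are hypothetical, chosen for illustration only.

```r
# Hypothetical helper: iterative PCA imputation (unregularized sketch).
impute_pca <- function(x, ncp = 1, max_iter = 100, tol = 1e-6) {
  x <- as.matrix(x)
  miss <- is.na(x)

  # Step 1: start from column-mean imputation.
  col_means <- colMeans(x, na.rm = TRUE)
  x_hat <- x
  x_hat[miss] <- col_means[col(x)[miss]]

  k <- seq_len(ncp)
  for (i in seq_len(max_iter)) {
    # Step 2: standardize the current completed data and take its SVD.
    mu <- colMeans(x_hat)
    sdv <- apply(x_hat, 2, sd)
    z <- scale(x_hat, center = mu, scale = sdv)
    sv <- svd(z)

    # Step 3: rank-ncp reconstruction, mapped back to the original units.
    z_fit <- sv$u[, k, drop = FALSE] %*%
      diag(sv$d[k], ncp, ncp) %*%
      t(sv$v[, k, drop = FALSE])
    fit <- sweep(sweep(z_fit, 2, sdv, "*"), 2, mu, "+")

    # Step 4: overwrite only the missing cells, then check convergence.
    change <- sum((x_hat[miss] - fit[miss])^2)
    x_hat[miss] <- fit[miss]
    if (change < tol) break
  }
  x_hat
}

imputed <- impute_pca(airquality, ncp = 1)
print(head(round(imputed, 1)))
```

Observed values are never modified. Only the missing cells are replaced, using the low-rank structure shared by the other variables. A production implementation would typically add regularization to avoid overfitting when many values are missing.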
The dataset contains 2,225 articles from the BBC news website corresponding to stories in five topical areas from 2004 to 2005. Each article is labeled with one of the following five classes: business, entertainment, politics, sport, and tech.
## [1] 86
## [1] 41
## $ncp
## [1] 4
##
## $criterion
## 0 1 2 3 4 5
## 63691268 42181039 7984657 5168754 2157407 2498253
Exploratory factor analysis (EFA) is a dimensionality reduction technique that is a natural extension of PCA. EFA is suggested instead of PCA when the variables are of ordinal type.
hsq contains the Humor Styles Questionnaire [HSQ] dataset, which includes responses from 1071 participants on 32 questions. The polychoric correlation was calculated using the mixedCor() function of the psych package.
## [1] 1071 39
## List of 6
## $ rho : num [1:32, 1:32] 1 -0.2094 -0.1772 -0.0945 -0.4466 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:32] "Q1" "Q2" "Q3" "Q4" ...
## .. ..$ : chr [1:32] "Q1" "Q2" "Q3" "Q4" ...
## $ rx : NULL
## $ poly :List of 4
## ..$ rho : num [1:32, 1:32] 1 -0.2094 -0.1772 -0.0945 -0.4466 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:32] "Q1" "Q2" "Q3" "Q4" ...
## .. .. ..$ : chr [1:32] "Q1" "Q2" "Q3" "Q4" ...
## ..$ tau : num [1:32, 1:6] -2.77 -2.77 -2.9 -3.11 -2.9 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:32] "Q1" "Q2" "Q3" "Q4" ...
## .. .. ..$ : chr [1:6] "1" "2" "3" "4" ...
## ..$ n.obs: int 1071
## ..$ Call : language polychoric(x = data[, p], smooth = smooth, global = global, weight = weight, correct = correct)
## ..- attr(*, "class")= chr [1:2] "psych" "poly"
## $ tetra:List of 2
## ..$ rho: NULL
## ..$ tau: NULL
## $ rpd : NULL
## $ Call : language mixedCor(data = hsq, c = NULL, p = 1:32)
## - attr(*, "class")= chr [1:2] "psych" "mixed"
hsq_polychoric <- hsq_correl$rho
H0: There is no significant difference between the correlation matrix and the identity matrix of the same dimensionality.
H1: There is a significant difference between them and, thus, we have strong evidence that there are underlying factors.
EFA is suitable when the p-value of the Bartlett sphericity test is below 0.05 (statistically significant).
## $chisq
## [1] 1114.409
##
## $p.value
## [1] 1.610583e-49
##
## $df
## [1] 496
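To make the test less of a black box, the statistic can be computed by hand. The sketch below uses the standard formula for Bartlett's sphericity test, with chi-square = -(n - 1 - (2p + 5)/6) * log(det(R)) and p(p - 1)/2 degrees of freedom. The helper bartlett_sphericity() is a hypothetical name introduced here, and the built-in mtcars data stands in for hsq, which is not bundled with R.

```r
# Bartlett's test of sphericity computed from a correlation matrix R
# and the number of observations n (hypothetical helper for illustration).
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = chisq,
       p.value = pchisq(chisq, df, lower.tail = FALSE),
       df = df)
}

# Stand-in example: 11 variables, 32 observations.
R_cars <- cor(mtcars)
res_bartlett <- bartlett_sphericity(R_cars, n = nrow(mtcars))
res_bartlett
```

The degrees of freedom depend only on the number of variables: for the 32 HSQ items, 32 * 31 / 2 = 496, which matches the $df value in the output above.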
The closer the Kaiser-Meyer-Olkin (KMO) value is to 1, the more effective and reliable the reduction will be. The factorability tests suggest that I can proceed with reducing the dimensionality of hsq.
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = hsq_polychoric)
## Overall MSA = 0.87
## MSA for each item =
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15
## 0.94 0.93 0.91 0.90 0.91 0.88 0.82 0.86 0.95 0.86 0.78 0.90 0.85 0.93 0.82
## Q16 Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30
## 0.85 0.87 0.83 0.89 0.83 0.87 0.84 0.81 0.84 0.83 0.89 0.83 0.93 0.87 0.81
## Q31 Q32
## 0.81 0.91
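The MSA values compare ordinary correlations with partial correlations: if items share common factors, the partial correlations (after controlling for all other items) should be small relative to the raw ones. A minimal base R sketch of that calculation follows; kmo_sketch() is a hypothetical helper written for this post, and mtcars is used as a stand-in correlation matrix.

```r
# Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix.
kmo_sketch <- function(R) {
  Q <- solve(R)  # inverse correlation matrix

  # Partial correlation of each pair, controlling for all other variables.
  partial <- -Q / sqrt(outer(diag(Q), diag(Q)))
  diag(partial) <- 0

  # Ordinary correlations with the diagonal removed.
  r <- R
  diag(r) <- 0

  overall  <- sum(r^2) / (sum(r^2) + sum(partial^2))
  per_item <- colSums(r^2) / (colSums(r^2) + colSums(partial^2))
  list(overall = overall, per_item = per_item)
}

kmo_cars <- kmo_sketch(cor(mtcars))
round(kmo_cars$overall, 2)
round(kmo_cars$per_item, 2)
```

Values near 1 mean the partial correlations are negligible, so most of the shared variance can plausibly be attributed to common factors.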
Let’s look at another popular extraction method, Principal Axis Factoring (PAF). PAF gives communality a central role in extracting factors, since an item’s communality can be interpreted as a measure of its relation to all other items. The approach is iterative. It starts from an initial estimate of the common variance in which the communalities are less than 1. These estimates replace the main diagonal of the correlation matrix (which usually consists of ones), factors are extracted from the updated matrix, and new communalities are computed from them. The replacement is repeated until a maximum number of iterations is reached or the communalities converge, that is, until the difference between two consecutive sets of communalities is negligible.
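The iteration described above can be sketched in base R. This is an illustrative simplification, not the implementation behind fa(): paf_sketch() is a hypothetical helper, squared multiple correlations are assumed as the starting communalities, and the built-in Harman74.cor correlation matrix (24 psychological tests) stands in for the HSQ data.

```r
# Principal axis factoring: iterate on the communalities in the diagonal.
paf_sketch <- function(R, nfactors, max_iter = 100, tol = 1e-4) {
  # Initial communalities: squared multiple correlations (all below 1).
  h2 <- 1 - 1 / diag(solve(R))

  for (iter in seq_len(max_iter)) {
    # Replace the ones on the diagonal with the current communalities.
    R_reduced <- R
    diag(R_reduced) <- h2

    # Extract factors from the reduced correlation matrix.
    e <- eigen(R_reduced, symmetric = TRUE)
    vals <- pmax(e$values[seq_len(nfactors)], 0)
    loadings <- e$vectors[, seq_len(nfactors), drop = FALSE] %*%
      diag(sqrt(vals), nfactors, nfactors)

    # New communalities are the row sums of squared loadings.
    h2_new <- rowSums(loadings^2)
    converged <- max(abs(h2_new - h2)) < tol
    h2 <- h2_new
    if (converged) break
  }

  rownames(loadings) <- colnames(R)
  list(loadings = loadings, communality = h2,
       uniqueness = 1 - h2, iterations = iter)
}

paf_fit <- paf_sketch(Harman74.cor$cov, nfactors = 4)
head(sort(paf_fit$communality, decreasing = TRUE))
```

The uniqueness returned here is simply one minus the communality, which is why sorting one in decreasing order gives the reverse item order of the other.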
hsq_correl_pa <- fa(hsq_polychoric, nfactors=4, fm="pa")
Identify variables that load well on the chosen factors
f_hsq_pa_common <- sort(hsq_correl_pa$communality, decreasing = TRUE)
f_hsq_pa_common
## Q20 Q17 Q25 Q18 Q8 Q10 Q21
## 0.6126774 0.5915220 0.5837439 0.5583484 0.5575454 0.5499162 0.5374533
## Q26 Q14 Q32 Q13 Q31 Q15 Q1
## 0.5246559 0.5189606 0.5184342 0.5011976 0.5003349 0.4972556 0.4790009
## Q12 Q5 Q2 Q4 Q29 Q7 Q6
## 0.4570093 0.4327538 0.4079485 0.4069703 0.3949128 0.3650824 0.3649526
## Q3 Q19 Q16 Q11 Q27 Q24 Q23
## 0.3634246 0.3472616 0.3225140 0.3018913 0.2992410 0.2866446 0.2719056
## Q30 Q9 Q28 Q22
## 0.2709174 0.2671010 0.2415143 0.1277128
f_hsq_pa_unique <- sort(hsq_correl_pa$uniqueness, decreasing = TRUE)
f_hsq_pa_unique
## Q22 Q28 Q9 Q30 Q23 Q24 Q27
## 0.8722872 0.7584857 0.7328990 0.7290826 0.7280944 0.7133554 0.7007590
## Q11 Q16 Q19 Q3 Q6 Q7 Q29
## 0.6981087 0.6774860 0.6527384 0.6365754 0.6350474 0.6349176 0.6050872
## Q4 Q2 Q5 Q12 Q1 Q15 Q31
## 0.5930297 0.5920515 0.5672462 0.5429907 0.5209991 0.5027444 0.4996651
## Q13 Q32 Q14 Q26 Q21 Q10 Q8
## 0.4988024 0.4815658 0.4810394 0.4753441 0.4625467 0.4500838 0.4424546
## Q18 Q25 Q17 Q20
## 0.4416516 0.4162561 0.4084780 0.3873226
## Parallel analysis suggests that the number of factors = 7 and the number of components = NA
The charts show the eigenvalues for both principal components and principal axis factor analysis.
## Parallel analysis suggests that the number of factors = 7 and the number of components = 5
## Parallel analysis suggests that the number of factors = 7 and the number of components = 5
Based on the three tests conducted, 4 factors should be retained.
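Parallel analysis compares the eigenvalues of the observed correlation matrix with those obtained from random data of the same size, and keeps only the dimensions that beat the random benchmark. Below is a minimal base R sketch of the component version of that idea. parallel_sketch() is a hypothetical helper, the 95th percentile threshold is an assumption of this sketch, and mtcars is a stand-in dataset; the factor version additionally works on a reduced correlation matrix, which is not shown here.

```r
set.seed(42)

# Parallel analysis for principal components (simplified sketch).
parallel_sketch <- function(x, n_sim = 100, quant = 0.95) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)

  # Eigenvalues of the observed correlation matrix.
  observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

  # Eigenvalues of correlation matrices of uncorrelated normal data.
  simulated <- replicate(n_sim, {
    random <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(random), symmetric = TRUE, only.values = TRUE)$values
  })
  threshold <- apply(simulated, 1, quantile, probs = quant)

  # Keep components up to the first one that fails to beat the threshold.
  fails <- which(observed <= threshold)
  n_keep <- if (length(fails) == 0) p else fails[1] - 1
  list(observed = observed, threshold = threshold, n_components = n_keep)
}

pa_cars <- parallel_sketch(mtcars)
pa_cars$n_components
```

Plotting observed against threshold reproduces the kind of scree chart with a random-data reference line that the parallel analysis charts show.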
This section will cover advanced applications of EFA.
## [1] "oblimin"
## [1] "promax"
## [1] "varimax"
The Varimax rotation method is the most suitable for arriving at the most interpretable EFA model on the HSQ dataset. The decision on the rotation method is based on the clarity of the path diagram and the interpretability of the arrow connections.
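To see what a rotation does to a set of loadings, the stats package that ships with R offers varimax() (orthogonal) and promax() (oblique); oblimin is not part of base R and is left out of this sketch. The example assumes the built-in Harman74.cor data as a stand-in for HSQ and uses factanal() rather than the extraction used elsewhere in this post.

```r
# Fit an unrotated 4-factor model on a built-in correlation matrix.
fit_unrotated <- factanal(factors = 4, covmat = Harman74.cor,
                          rotation = "none")
unrotated <- loadings(fit_unrotated)

# Orthogonal rotation: factors stay uncorrelated.
rot_varimax <- varimax(unrotated)

# Oblique rotation: factors are allowed to correlate.
rot_promax <- promax(unrotated)

# An orthogonal rotation redistributes variance between factors
# but leaves each item's communality unchanged.
h2_before <- rowSums(unclass(unrotated)^2)
h2_after  <- rowSums(unclass(rot_varimax$loadings)^2)
max(abs(h2_before - h2_after))

# Compare the first rows of both rotated solutions.
round(unclass(rot_varimax$loadings)[1:4, ], 3)
round(unclass(rot_promax$loadings)[1:4, ], 3)
```

Because the fit itself does not change under rotation, choosing between the candidates comes down to which loading pattern is easiest to interpret.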
The loadings matrix is accessible through the loadings attribute of the fitted model.
##
## Loadings:
## MR1 MR2 MR4 MR3
## Q1 0.675 -0.055 -0.005 0.029
## Q2 -0.085 -0.023 0.604 -0.034
## Q3 -0.113 0.130 0.082 -0.512
## Q4 0.018 0.635 0.023 0.002
## Q5 -0.599 0.066 0.095 -0.015
## Q6 -0.257 -0.038 0.462 -0.040
## Q7 -0.166 -0.030 0.002 0.607
## Q8 -0.027 0.741 0.009 0.007
## Q9 0.485 -0.124 0.011 0.016
## Q10 0.034 0.056 0.736 -0.020
## Q11 0.139 -0.018 0.163 -0.541
## Q12 -0.142 0.663 -0.045 0.054
## Q13 -0.641 0.006 0.143 0.005
## Q14 -0.173 -0.018 0.644 0.022
## Q15 0.055 -0.044 0.078 0.688
## Q16 0.123 -0.523 0.118 0.126
## Q17 0.769 -0.035 0.043 0.050
## Q18 0.107 0.005 0.780 -0.004
## Q19 -0.122 0.141 0.229 -0.412
## Q20 0.099 0.779 -0.021 -0.082
## Q21 -0.641 0.131 0.153 0.156
## Q22 0.136 0.059 -0.267 0.105
## Q23 0.124 0.036 0.004 0.491
## Q24 0.218 0.509 0.083 0.042
## Q25 0.761 0.061 -0.020 0.007
## Q26 -0.078 0.033 0.685 0.032
## Q27 0.103 0.058 0.114 -0.530
## Q28 -0.075 0.272 0.248 -0.155
## Q29 0.607 0.182 0.068 0.151
## Q30 0.005 -0.133 0.538 -0.008
## Q31 0.107 0.066 0.080 0.693
## Q32 -0.074 0.694 0.051 0.023
##
## MR1 MR2 MR4 MR3
## SS loadings 3.768 3.241 3.233 2.688
## Proportion Var 0.118 0.101 0.101 0.084
## Cumulative Var 0.118 0.219 0.320 0.404
HSQ measures two positive features for styles of humor: affiliative and self-enhancing.
HSQ measures two negative features for styles of humor: aggressive and self-defeating.
The extracted factor MR1 could measure the affiliative style. This factor maps to most or all of the questions that correspond to the affiliative style: in the loadings above, MR1 loads most strongly on Q1, Q5, Q9, Q13, Q17, Q21, Q25 and Q29. The classification of the questionnaire items is listed above.
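Reading a loadings table by eye gets tedious with 32 items, so the item-to-factor mapping can be automated by picking, for each item, the factor with the largest absolute loading. The sketch below copies the first eight rows of the loadings printed above into a small matrix; the object names (loadings_subset, dominant) are introduced here for illustration.

```r
# First eight rows of the HSQ loadings table shown above.
loadings_subset <- matrix(
  c( 0.675, -0.055, -0.005,  0.029,
    -0.085, -0.023,  0.604, -0.034,
    -0.113,  0.130,  0.082, -0.512,
     0.018,  0.635,  0.023,  0.002,
    -0.599,  0.066,  0.095, -0.015,
    -0.257, -0.038,  0.462, -0.040,
    -0.166, -0.030,  0.002,  0.607,
    -0.027,  0.741,  0.009,  0.007),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("Q", 1:8), c("MR1", "MR2", "MR4", "MR3"))
)

# For each item, find the factor with the largest absolute loading.
dominant <- colnames(loadings_subset)[
  apply(abs(loadings_subset), 1, which.max)
]
names(dominant) <- rownames(loadings_subset)

# Group the items by their dominant factor.
split(names(dominant), dominant)
```

On these eight rows the items cycle through the four factors (Q1 and Q5 on MR1, Q2 and Q6 on MR4, Q3 and Q7 on MR3, Q4 and Q8 on MR2). The sign of a loading still matters for interpretation: Q5 loads negatively on MR1, which suggests a reverse-worded item.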
The Short Dark Triad (SD3) dataset results from measuring the three dark personality traits:
- machiavellianism (a manipulative behaviour)
- narcissism (excessive self-admiration)
- psychopathy (lack of empathy)
Interactive version of the test: https://openpsychometrics.org/tests/SD3/
The sdt_sub_correl has been calculated with the hetcor() function of the polycor package.
## List of 7
## $ correlations: num [1:27, 1:27] 1 0.184 0.102 0.217 0.369 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:27] "M1" "M2" "M3" "M4" ...
## .. ..$ : chr [1:27] "M1" "M2" "M3" "M4" ...
## $ type : chr [1:27, 1:27] "" "Pearson" "Pearson" "Pearson" ...
## $ NA.method : chr "complete.obs"
## $ ML : logi FALSE
## $ std.errors : num [1:27, 1:27] 0 0.0969 0.0993 0.0956 0.0868 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:27] "M1" "M2" "M3" "M4" ...
## .. ..$ : chr [1:27] "M1" "M2" "M3" "M4" ...
## $ n : int 100
## $ tests : num [1:27, 1:27] 0.00 5.78e-13 1.55e-16 8.63e-14 4.36e-14 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:27] "M1" "M2" "M3" "M4" ...
## .. ..$ : chr [1:27] "M1" "M2" "M3" "M4" ...
## - attr(*, "class")= chr "hetcor"
## $chisq
## [1] 1019.442
##
## $p.value
## [1] 2.054927e-66
##
## $df
## [1] 351
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = sdt_polychoric)
## Overall MSA = 0.82
## MSA for each item =
## M1 M2 M3 M4 M5 M6 M7 M8 M9 N1 N2 N3 N4 N5 N6
## 0.78 0.84 0.80 0.66 0.91 0.84 0.68 0.77 0.79 0.80 0.82 0.83 0.87 0.85 0.84
## N7 N8 N9 P1 P2 P3 P4 P5 P6 P7 P8 P9
## 0.80 0.81 0.89 0.89 0.64 0.87 0.52 0.81 0.88 0.52 0.63 0.85
The number of factors recommended is 6.
Conduct parallel analysis with the minres extraction method and check the Kaiser-Guttman criterion.
## Parallel analysis suggests that the number of factors = 4 and the number of components = NA
The Kaiser-Guttman criterion and the scree test suggest 3 and 4 factors.
A total of 4 factors are extracted with the maximum likelihood extraction method.
##
## Loadings:
## ML1 ML4 ML2 ML3
## M1 0.005 0.043 0.578 -0.194
## M2 0.236 0.407 0.193 0.152
## M3 -0.019 0.654 0.023 0.091
## M4 0.029 0.329 0.254 -0.134
## M5 0.184 0.179 0.550 0.075
## M6 0.064 -0.099 0.849 0.055
## M7 0.104 0.171 0.438 -0.454
## M8 0.504 0.255 -0.025 -0.183
## M9 0.048 0.325 0.450 0.037
## N1 0.082 0.202 0.033 0.409
## N2 0.037 -0.160 -0.105 -0.501
## N3 0.221 0.056 0.012 0.615
## N4 -0.014 0.438 0.160 0.372
## N5 -0.059 0.580 0.107 0.166
## N6 -0.299 -0.300 0.104 -0.356
## N7 -0.189 0.346 0.222 0.219
## N8 -0.197 -0.058 -0.276 -0.334
## N9 0.754 -0.003 0.014 -0.017
## P1 0.411 0.012 0.296 0.053
## P2 0.001 -0.129 -0.089 -0.213
## P3 0.395 -0.008 0.220 0.020
## P4 0.015 0.104 -0.111 0.318
## P5 0.556 0.026 0.076 0.070
## P6 0.634 -0.047 0.174 0.139
## P7 -0.419 0.131 0.190 -0.016
## P8 0.101 0.594 -0.179 -0.277
## P9 0.261 0.525 -0.049 0.084
##
## ML1 ML4 ML2 ML3
## SS loadings 2.445 2.432 2.304 1.844
## Proportion Var 0.091 0.090 0.085 0.068
## Cumulative Var 0.091 0.181 0.266 0.334
The path diagram helps with drawing conclusions about the underlying factors in the dataset.
The twenty-seven statements of the Short Dark Triad test correspond well to the three personality traits.